88 research outputs found

    Data Reduction Techniques for Sensor Networks

    We are inevitably moving into a realm where small and inexpensive wireless devices will be seamlessly embedded in the physical world and form a wireless sensor network in order to perform complex monitoring and computational tasks. Such networks pose new challenges in data processing and dissemination due to the conflict between (i) the abundance of information that can be collected and processed in a distributed fashion among thousands of nodes and (ii) the limited resources (bandwidth, energy) that such devices possess. In this paper we propose a new data reduction technique that exploits the correlation and redundancy among multiple measurements on the same sensor and achieves a high degree of data reduction while managing to capture even the smallest details of the recorded measurements. The key to our technique is the base signal, a series of values extracted from the real measurements, used for encoding piece-wise linear correlations among the collected data values. We provide efficient algorithms for extracting the base signal features from the data and for encoding the measurements using these features. Our experiments demonstrate that our method by far outperforms standard approximation techniques like Wavelets, Histograms and the Discrete Cosine Transform, on a variety of error metrics and for real datasets from different domains. (UMIACS-TR-2003-80)
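    The idea of encoding a data interval as a linear function of a base-signal interval can be illustrated with an ordinary least-squares fit. This is only a minimal sketch: the function name `fit_segment` and the sample values are hypothetical, and the abstract does not specify how the authors select base-signal features or compute the fit.

```python
def fit_segment(base, data):
    """Least-squares fit data[i] ~ a * base[i] + b over one interval.

    Illustrative assumption: one (a, b) pair per interval replaces the
    raw values, which is where the data reduction would come from.
    """
    n = len(base)
    mean_x = sum(base) / n
    mean_y = sum(data) / n
    sxx = sum((x - mean_x) ** 2 for x in base)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(base, data))
    a = sxy / sxx if sxx else 0.0  # a constant base interval carries no slope
    b = mean_y - a * mean_x
    return a, b


def max_abs_error(base, data, a, b):
    """Largest absolute reconstruction error over the interval."""
    return max(abs(y - (a * x + b)) for x, y in zip(base, data))


base_interval = [1.0, 2.0, 3.0, 4.0]   # hypothetical base-signal values
measurements = [3.0, 5.0, 7.0, 9.0]    # hypothetical sensor readings
a, b = fit_segment(base_interval, measurements)
error = max_abs_error(base_interval, measurements, a, b)
```

    Here four readings are replaced by two coefficients plus a reference to the base interval; in this toy case the readings are an exact linear function of the base interval, so the reconstruction error is zero.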

    Towards Analytics Aware Ontology Based Access to Static and Streaming Data (Extended Version)

    Real-time analytics that requires integration and aggregation of heterogeneous and distributed streaming and static data is a typical task in many industrial scenarios, such as diagnostics of turbines in Siemens. The OBDA approach has great potential to facilitate such tasks; however, it has a number of limitations in dealing with analytics that restrict its use in important industrial applications. Based on our experience with Siemens, we argue that in order to overcome those limitations OBDA should be extended to become analytics, source, and cost aware. In this work we propose such an extension. In particular, we propose an ontology, mapping, and query language for OBDA, where aggregate and other analytical functions are first-class citizens. Moreover, we develop query optimisation techniques that allow efficient processing of analytical tasks over static and streaming data. We implement our approach in a system and evaluate our system with Siemens turbine data.

    Outlier-Aware Data Aggregation in Sensor Networks

    In this paper we discuss a robust aggregation framework that can detect spurious measurements and refrain from incorporating them in the computed aggregate values. Our framework can consider different definitions of an outlier node, based on a specified minimum support. Our experimental evaluation demonstrates the benefits of our approach.

    Another Outlier Bites the Dust: Computing Meaningful Aggregates in Sensor Networks

    Recent work has demonstrated that readings provided by commodity sensor nodes are often of poor quality. In order to provide a valuable sensory infrastructure for monitoring applications, we first need to devise techniques that can withstand “dirty” and unreliable data during query processing. In this paper we present a novel aggregation framework that detects suspicious measurements by outlier nodes and refrains from incorporating such measurements in the computed aggregate values. We consider different definitions of an outlier node, based on the notion of a user-specified minimum support, and discuss techniques for properly routing messages in the network in order to reduce the bandwidth consumption and the energy drain during the query evaluation. In our experiments using real and synthetic traces we demonstrate that: (i) a straightforward evaluation of a user aggregate query leads to practically meaningless results due to the existence of outliers; (ii) our techniques can detect and eliminate spurious readings without any application-specific knowledge of what constitutes normal behavior; (iii) the identification of outliers, when performed inside the network, significantly reduces bandwidth and energy drain compared to alternative methods that centrally collect and analyze all sensory data; and (iv) we can significantly reduce the cost of the aggregation process by utilizing simple statistics on outlier nodes and reorganizing the collection tree accordingly.
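    A minimum-support outlier test can be sketched as follows. This is a centralized illustration under stated assumptions, not the in-network framework of the paper: the similarity test (absolute difference within `tolerance`), the function name `robust_average`, and the readings are all hypothetical.

```python
from statistics import mean


def robust_average(readings, tolerance, min_support):
    """Average only the readings supported by enough other nodes.

    readings: dict mapping node id -> measured value.
    A node counts as an outlier when fewer than `min_support` other
    nodes report a value within `tolerance` of its own (an assumed
    similarity test, chosen here for illustration).
    """
    kept = {}
    for node, value in readings.items():
        supporters = sum(
            1
            for other, other_value in readings.items()
            if other != node and abs(other_value - value) <= tolerance
        )
        if supporters >= min_support:
            kept[node] = value
    outliers = sorted(set(readings) - set(kept))
    average = mean(kept.values()) if kept else None
    return average, outliers


# Hypothetical temperature readings; node n4 reports a spurious value.
readings = {"n1": 20.1, "n2": 19.8, "n3": 20.4, "n4": 85.0, "n5": 20.0}
naive = mean(readings.values())
avg, outliers = robust_average(readings, tolerance=1.0, min_support=2)
```

    With these values the naive average is pulled above 33 by the single spurious reading, while the supported readings average to about 20.1 and only `n4` is flagged.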

    DCC&U: An Extended Digital Curation Lifecycle Model

    The proliferation of Web, database and social networking technologies has enabled us to produce, publish and exchange digital assets at an enormous rate. This vast amount of information that is either digitized or born-digital needs to be collected, organized and preserved in a way that ensures that our digital assets and the information they carry remain available for future use. Digital curation has emerged as a new inter-disciplinary practice that seeks to set guidelines for disciplined management of information. In this paper we review two recent models for digital curation introduced by the Digital Curation Centre (DCC) and the Digital Curation Unit (DCU) of the Athena Research Centre. We then propose a fusion of the two models that highlights the need to extend the digital curation lifecycle by adding (a) provisions for the registration of usage experience, (b) a stage for knowledge enhancement and (c) controlled vocabularies used by convention to denote concepts, properties and relations. The objective of the proposed extensions is twofold: (i) to provide a more complete lifecycle model for the digital curation domain; and (ii) to provide a stimulus for a broader discussion on the research agenda.

    Processing Proximity Queries in Sensor Networks

    Sensor networks are often used to perform monitoring tasks, such as in animal or vehicle tracking and in surveillance of enemy forces in military applications. In this paper we introduce the concept of proximity queries that allow us to report interesting events that are observed by nodes in the network that are within a certain distance of each other. An event is triggered when a user-programmable predicate is satisfied on a sensor node. We study the problem of computing proximity queries in sensor networks using existing communication protocols and then propose an efficient algorithm that can process multiple proximity queries, involving several different event types. Our solution utilizes a distributed routing index, maintained by the nodes in the network, that is dynamically updated as new observations are obtained by the nodes. We present an extensive experimental study to show the benefits of our techniques under different scenarios. Our results demonstrate that our algorithms scale better and require orders of magnitude fewer messages compared to a straightforward computation of the queries.
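    The semantics of a proximity query can be made concrete with a centralized evaluation, which corresponds to the kind of straightforward computation the abstract uses as a point of comparison rather than to the proposed distributed routing index. The `Event` record, the function `proximity_pairs`, and the coordinates below are hypothetical.

```python
from collections import namedtuple
from math import dist

# Hypothetical event record: reporting node, (x, y) position, event type.
Event = namedtuple("Event", ["node", "pos", "kind"])


def proximity_pairs(events, type_a, type_b, max_distance):
    """Return ordered (node_a, node_b) pairs where an event of type_a and
    an event of type_b were observed within max_distance of each other."""
    return [
        (a.node, b.node)
        for a in events
        if a.kind == type_a
        for b in events
        if b.kind == type_b
        and a.node != b.node
        and dist(a.pos, b.pos) <= max_distance
    ]


events = [
    Event("n1", (0.0, 0.0), "vehicle"),
    Event("n2", (3.0, 4.0), "animal"),
    Event("n3", (10.0, 10.0), "animal"),
]
pairs = proximity_pairs(events, "vehicle", "animal", max_distance=5.0)
```

    Only `n1` and `n2` are within 5 units of each other here. This version compares every pair at a single collection point, which is the message-heavy pattern a distributed index is meant to avoid.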

    A Generalized Framework for Indexing OLAP Aggregates

    Decision support applications often require fast response time to a wide variety of aggregate queries extracted from huge amounts of data. In this paper we propose the use of well-organized packed R-trees for storing and maintaining multidimensional aggregates. Moreover, we present a general framework for mapping OLAP data to a collection of R-trees that achieve a high degree of data clustering with very low space overhead. We then propose four different allocation strategies designed to optimize different application needs. In the second part of the paper we present experimental results on high-dimensionality OLAP data (up to 10 dimensions) of realistic size. Finally, we characterize the performance of the proposed allocation strategies with respect to both incremental updates and response time for a variety of different queries. (Also cross-referenced as UMIACS-TR-97-76)

    Extending the Data Warehouse for Service Provisioning Data

    Over the last few years, there has been an extensive body of literature in data warehousing applications that primarily focuses on basket-type (transactional) data, common in retail industries. In this paper we focus on service provisioning data, that is, data that is recorded internally in an organization for provisioning certain business-related tasks. Coupling the recorded data with the underlying process and business practice(s) that generate them is crucial for end-to-end analysis. Our framework is based on a graph description of the process (called a sketch) that is generating this data. Using this sketch, we formalize a new class of aggregate queries that consolidate data from a part of the process, based on a user-defined path expression. We then show how to build a compact, non-redundant collection of summary (aggregate) tables and indices for this new type of query. We first explore how to select a minimum set of views to answer queries with path expressions over the given sketch. For queries that also include aggregation, we define two partial orders among the views. The first is used to pick the minimum set of aggregate views to answer any query with no false dismissals, while the second describes an augmented set that allows fewer false positives. Computing a non-materialized aggregate is done through appropriate rewriting of the user query. We describe two indexing schemes that use phantom (non-materialized) aggregate values to expedite query processing. Experimental results show these schemes to perform well on synthetic and real datasets.

    An alternative storage organization for ROLAP aggregate views based on cubetrees

    as the dominant approach in data warehousing with decision support applications. In order to enhance query performance, the ROLAP approach relies on selecting and materializing in summary tables appropriate subsets of aggregate views which are then engaged in speeding up OLAP queries. However, a straightforward relational storage implementation of materialized ROLAP views is immensely wasteful on storage and incredibly inadequate on query performance and incremental update speed. In this paper we propose the use of Cubetrees, a collection of packed and compressed R-trees, as an alternative storage and index organization for ROLAP views and provide an efficient algorithm for mapping an arbitrary set of OLAP views to a collection of Cubetrees that achieve excellent performance. Compared to a conventional (relational) storage organization of materialized OLAP views, Cubetrees offer at least a 2-1 storage reduction, 10-1 better OLAP query performance, and 100-1 faster updates. We compare the two alternative approaches with data generated from the TPC-D benchmark and stored in the Informix Universal Server (IUS). The straightforward implementation materializes the ROLAP views using IUS tables and conventional B-tree indexing. The Cubetree implementation materializes the same ROLAP views using a Cubetree Datablade developed for IUS. The experiments demonstrate that the Cubetree storage organization is superior in storage, query performance and update speed.

    Snapshot Queries: Towards Data-Centric Sensor Networks

    In this paper we introduce the idea of snapshot queries for energy-efficient data acquisition in sensor networks. Network nodes generate models of their surrounding environment that are used for electing, using a localized algorithm, a small set of representative nodes in the network. These representative nodes constitute a network snapshot and can be used to provide quick approximate answers to user queries while substantially reducing the energy consumption in the network. We present a detailed experimental study of our framework and algorithms, varying multiple parameters such as the available memory of the sensor nodes, their transmission range, and the network message loss. Depending on the configuration, snapshot queries provide a reduction of up to 90% in the number of nodes that need to participate in a user query.
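    The notion of a snapshot built from representative nodes can be sketched with a greedy election. Two simplifications are assumed here and are not taken from the paper: the election runs centrally rather than as a localized algorithm, and a node is taken to "model" a neighbor when their current readings differ by at most `max_error`. All names and values are hypothetical.

```python
def elect_representatives(values, neighbors, max_error):
    """Greedily pick nodes that can stand in for similar neighbors.

    values: node id -> current reading.
    neighbors: node id -> set of node ids within radio range.
    Returns a dict mapping each representative to the nodes it covers.
    """
    # Every node can represent itself plus any neighbor with a similar reading.
    covers = {
        n: {n} | {m for m in neighbors[n] if abs(values[n] - values[m]) <= max_error}
        for n in values
    }
    uncovered = set(values)
    representatives = {}
    while uncovered:
        # Choose the node covering the most uncovered nodes (ties: smallest id).
        best = max(sorted(covers), key=lambda n: len(covers[n] & uncovered))
        represented = covers[best] & uncovered
        representatives[best] = represented
        uncovered -= represented
    return representatives


values = {"a": 10.0, "b": 10.2, "c": 10.1, "d": 30.0, "e": 29.8}
neighbors = {
    "a": {"b", "c"},
    "b": {"a", "c", "d"},
    "c": {"a", "b"},
    "d": {"b", "e"},
    "e": {"d"},
}
snapshot = elect_representatives(values, neighbors, max_error=0.5)
```

    In this toy network two representatives (`a` and `d`) cover all five nodes, so a query answered from the snapshot would involve two nodes instead of five.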